Inverse Document Frequency and Web Search Engines

نویسندگان

Kevin Prey

James C. French

Allison L. Powell

Charles L. Viles

چکیده

INTRODUCTION Full text searching over a database of moderate size often uses the inverse document frequency, idf = log(N/df), as a component in term weighting functions used for document indexing and retrieval. However, in very large databases (e.g. internet search engines), there is the potential that the collection size (N) could dominate the idf value, decreasing the usefulness of idf as a term weighting component. In this short paper we examine the properties of idf in the context of internet search engines. The observed idf values may also shed light upon the indexed content of the WWW. For example, if the internet search engines we survey index random samples of the WWW, we would expect similar idf values for the same term across the different search engines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

مدل جدیدی برای جستجوی عبارت بر اساس کمینه جابه‌جایی وزن‌دار

Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...

متن کامل

An Ensemble Click Model for Web Document Ranking

Annually, web search engine providers spend more and more money on documents ranking in search engines result pages (SERP). Click models provide advantageous information for ranking documents in SERPs through modeling interactions among users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

متن کامل

NovaSearch at TREC 2013 Federated Web Search Track: Experiments with rank fusion

We propose an unsupervised late-fusion approach for the results merging task, based on combining the ranks from all the search engines. Our idea is based on the known pressure for Web search engines to put the most relevant documents at the very top of their ranks and the intuition that relevance of a document should increase as it appears on more search engines [9]. We performed experiments wi...

متن کامل

Keeping Web Indices up-to-date

Search engines play a crucial role in the Web. Without search engines large parts of the Web becomes inaccessible for the majority of users. Search engines can make new and smaller sites accessible at low cost. Without them, other media, such as Television, would be needed to advertise the existence new site on the Web, only large commercial sites can follow this path. The Web would be endanger...

متن کامل

Incorporating Hyperlink Analysis in Web Page Clustering

The size of the World Wide Web is growing rapidly and it has become a very important source of information that can be useful to various academic and commercial applications. However, because of the large number of documents online, it is becoming increasingly difficult to search for useful information on the Web. General-purpose Web search engines, such as Google and AltaVista, present search ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2000

Inverse Document Frequency and Web Search Engines

نویسندگان

چکیده

منابع مشابه

مدل جدیدی برای جستجوی عبارت بر اساس کمینه جابه‌جایی وزن‌دار

An Ensemble Click Model for Web Document Ranking

NovaSearch at TREC 2013 Federated Web Search Track: Experiments with rank fusion

Keeping Web Indices up-to-date

Incorporating Hyperlink Analysis in Web Page Clustering

عنوان ژورنال:

اشتراک گذاری